where $x_i$ is the $i$-th component of the real vector $\vec{x}$, and $A_{ij}$ and $D^{1,2}_{ij}$ are the $(i,j)$ components of the matrices $\mathbf{A}$ and $\mathbf{D^{1,2}}$. $A_{ij}$ is a complex number whereas $D^{1,2}_{ij}$ are real numbers. $A^{*}_{ij}$ denotes the complex conjugate of $A_{ij}$.
$\mathbf{A}$ is the transformation operator from the real dictionary field to the observed complex shear field. Note that we do not consider the $B$-mode, so $\vec{x}$ is a real vector.
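For intuition, the part of $\mathbf{A}$ that produces a complex shear field from a real convergence-like field can be sketched with the standard Kaiser-Squires kernel applied by multiplication in Fourier space. This is a minimal stand-alone illustration on a periodic grid, not the document's `massmap_ks2D` class; the function names and grid size are placeholders.

```python
import numpy as np

def _ks_kernel(ny, nx):
    """Kaiser-Squires kernel P(k) = (k1 + i k2)^2 / |k|^2, with P(0) = 0."""
    k1 = np.fft.fftfreq(nx)[None, :]
    k2 = np.fft.fftfreq(ny)[:, None]
    ksq = k1**2 + k2**2
    return np.divide((k1 + 1j * k2)**2, ksq,
                     out=np.zeros((ny, nx), complex), where=ksq > 0)

def ks_forward(kappa):
    """Real convergence map -> complex shear field."""
    p = _ks_kernel(*kappa.shape)
    return np.fft.ifft2(p * np.fft.fft2(kappa))

def ks_inverse(gamma):
    """Complex shear field -> real convergence map (|P| = 1 away from k = 0)."""
    p = _ks_kernel(*gamma.shape)
    return np.fft.ifft2(np.conj(p) * np.fft.fft2(gamma)).real
```

Since $|P(k)|=1$ for $k\neq 0$, applying the inverse after the forward operator recovers a mean-subtracted input map exactly.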
# sparseBase.massmapSparsityTask.__init__
## Initialization of the Dictionary Space (A)
# 1) Basis vectors used to model the surface density of halos
if dicname=='starlet':
    from halolet import starlet2D
    self.dict2D = starlet2D(gen=2,nframe=self.nframe,ny=self.ny,nx=self.nx)
elif dicname=='nfwlet':
    from halolet import nfwlet2D
    self.dict2D = nfwlet2D(nframe=self.nframe,ngrid=self.nx,smooth_scale=-1)
# 2) From Surface Density to Excess Surface Density
self.ks2D = massmap_ks2D(self.ny,self.nx)
# 3) Lensing Kernel: from Excess Surface Density to Shear
if self.nlp<=1:
    self.lensKernel = np.ones((self.nz,self.nlp))
else:
    self.lensKernel = np.zeros((self.nz,self.nlp))
    for i,zs in enumerate(zsbin):
        self.lensKernel[i,:] = self.cosmo.deltacritinv(zlbin,zs)
$\mathbf{D^{1,2}}$ refer to the difference operators in the two transverse (ra, dec) directions. $\lambda_2$ controls the amplitude of the Total Square Variance (TSV) term.
# sparseBase.massmapSparsityTask.gradient_TSV
# Definition of D1 and D2
difx = np.roll(self.alphaR[:,0,:,:],1,axis=-1)
difx = difx-self.alphaR[:,0,:,:] # D1
dify = np.roll(self.alphaR[:,0,:,:],1,axis=-2)
dify = dify-self.alphaR[:,0,:,:] # D2
We separate out the second order terms in the loss function.
$$F(\vec{x})=\frac{1}{2}(y_{i}-A^{*}_{ij}x_{j})(y_{i}-A_{ik}x_{k})+\frac{\lambda_2}{2}\left[(D^{1}_{ij} x_{j})(D^{1}_{ik}x_{k})+ (D^{2}_{ij} x_{j})(D^{2}_{ik}x_{k})\right].$$

Its gradient is

$$\partial_\alpha F(\vec{x})=-A_{i\alpha}^{*} (y_{i}-A_{ij}x_{j}) + \lambda_2\left(D^1_{i\alpha}D^1_{ij} x_{j}+ D^2_{i\alpha} D^2_{ij} x_{j}\right).$$

# sparseBase.massmapSparsityTask.gradient
# calculate the gradient of the second-order component of the loss function,
# which includes the Total Square Variance (TSV) and chi2 components
gCh2=self.gradient_chi2()
gTSV=self.gradient_TSV()
# sparseBase.massmapSparsityTask.gradient_chi2
# calculate the gradient of the chi2 component
shearRTmp = self.main_forward(self.alphaR) #A_{ij} x_j
self.shearRRes = self.shearR-shearRTmp #y_i-A_{ij} x_j
dalphaR = -self.main_transpose(self.shearRRes)/2. # -A^{*}_{i\alpha}(y_i-A_{ij} x_j)/2
# sparseBase.massmapSparsityTask.gradient_TSV
gradx = np.roll(difx,-1,axis=-1)
gradx = gradx-difx # (D1)_{i\alpha} (D1)_{ij} x_j
grady = np.roll(dify,-1,axis=-2)
grady = grady-dify # (D2)_{i\alpha} (D2)_{ij} x_j
dalphaR[:,0,:,:]=(gradx+grady)*self.lbd2/2.
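The roll-based expressions above can be checked against the TSV energy they differentiate. This is a self-contained sketch with illustrative function names (`tsv_energy`, `tsv_gradient` are not in the original code); it reproduces the same periodic `np.roll` pattern as `gradient_TSV`.

```python
import numpy as np

def tsv_energy(x):
    """E(x) = 1/2 sum[(D1 x)^2 + (D2 x)^2] with periodic (np.roll) differences."""
    difx = np.roll(x, 1, axis=-1) - x  # D1 x
    dify = np.roll(x, 1, axis=-2) - x  # D2 x
    return 0.5 * (np.sum(difx**2) + np.sum(dify**2))

def tsv_gradient(x):
    """Gradient of tsv_energy, using the same roll pattern as gradient_TSV."""
    difx = np.roll(x, 1, axis=-1) - x
    dify = np.roll(x, 1, axis=-2) - x
    gradx = np.roll(difx, -1, axis=-1) - difx  # (D1)^T (D1) x
    grady = np.roll(dify, -1, axis=-2) - dify  # (D2)^T (D2) x
    return gradx + grady
```

A finite-difference probe of `tsv_energy` at a few pixels agrees with `tsv_gradient`, confirming the sign conventions of the rolls.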
The starting point is set to $x_i^{(0)}=0$. We approach the minimum of the loss function $L(\vec{x})$ by iteratively updating $\vec{x}$. At each iteration, we find the coordinate on which the projected derivative has the largest signal-to-noise ratio. This coordinate is recorded as $x_{\alpha}$ and the largest signal-to-noise ratio as $\mathrm{S/N}_{\alpha}$. We update $\lambda=\max(0.98\,\mathrm{S/N}_{\alpha}, \lambda_1)$, and then update along this specific coordinate $x_{\alpha}$:
$$x_{\alpha}^{(n+1)}=S_{\lambda\sigma_{\alpha}}\left(x_{\alpha}^{(n)}-\frac{\partial_{\alpha}F(\vec{x})}{A_{i\alpha}A_{i\alpha}+4\lambda_2}\right),$$

where $S_{\lambda}$ is the soft-thresholding function defined as

$$S_{\lambda} (x_{\alpha})=\mathrm{sign}(x_{\alpha}) \max(|x_{\alpha}| - \lambda,0).$$

self.dalphaR = -self.mu*self.gradient().real
# Update thresholds
snrArray=np.abs(self.dalphaR)/self.sigmaA
ind=np.unravel_index(np.argmax(snrArray, axis=None), snrArray.shape)
if snrArray[ind]<self.lbd:
    self.lbd_path[irun:]=self.lbd
    return
elif snrArray[ind]>1.02*self.lbd_path[irun-1]:
    # keep the previous threshold if the maximum S/N is well above it,
    # and record it to lbd_path
    self.lbd_path[irun]=self.lbd_path[irun-1]
else:
    # decrease the threshold
    self.lbd_path[irun]=max(snrArray[ind]*0.98,self.lbd)
self.thresholds=self.lbd_path[irun]*self.sigmaA
dum = self.alphaR+self.dalphaR
if threM=='ST':
    self.alphaR[ind] = soft_thresholding(dum,self.thresholds)[ind]
elif threM=='FT':
    self.alphaR[ind] = firm_thresholding(dum,self.thresholds)[ind]
Note that here $A_{i\alpha} A_{i\alpha}$ should be estimated taking the mask of the observed shear field into account.
# sparseBase.massmapSparsityTask.spectrum_norm
# Estimate A_{i\alpha} A_{i\alpha}
asquareframe=np.zeros((self.nz,self.nframe,self.ny,self.nx))
for iz in range(self.nz):
    maskF=np.fft.fft2(self.mask[iz,:,:])
    for iframe in range(self.nframe):
        fun=np.abs(self.ks2D.transform(self.dict2D.fouaframes[iframe,:,:],outFou=False))**2.
        asquareframe[iz,iframe,:,:]=np.fft.ifft2(np.fft.fft2(fun).real*maskF).real
spectrum=np.sum(self.lensKernel[:,:,None,None,None]*asquareframe[:,None,:,:,:],axis=0)
self.mu=1./(spectrum+4.*self.lbd2)
The starting point is set to $x_i^{(0)}=0$. We approach the minimum of the loss function $L(\vec{x})$ by iteratively updating $\vec{x}$. At each iteration, we step along the gradient of $F(\vec{x})$, and each coordinate $x_j$ of $\vec{x}$ is updated as follows:
$$z_{j}^{(n+1)}=S_{\lambda_1 \sigma_{j}}\left(x_{j}^{(n)}- \mu\, \partial_{j} F(\vec{x}^{(n)}) \right),$$

where $\mu$ is the step size of the iteration and $S_{\lambda}$ is the soft-thresholding function defined above. The intermediate result $\vec{z}^{(n+1)}$ is then combined with a FISTA momentum term:

$$t^{(0)}=0, \qquad t^{(n+1)}= \frac{1+\sqrt{1+4(t^{(n)})^2}}{2},$$

$$\vec{x}^{(n+1)}=\vec{z}^{(n+1)}+\frac{t^{(n)} -1}{t^{(n+1)}} \left(\vec{z}^{(n+1)}-\vec{x}^{(n)}\right).$$

dalphaR = -self.mu*self.gradient().real
dum = self.alphaR.real+dalphaR
if threM=='FT':
    dum = firm_thresholding(dum,self.thresholds)
elif threM=='ST':
    dum = soft_thresholding(dum,self.thresholds)
#update x_\alpha according to FISTA
tnTmp= (1.+np.sqrt(1.+4*tn**2.))/2.
ratio= (tn-1.)/tnTmp
tn = tnTmp
self.alphaR=dum+(ratio*(dum-self.alphaR))
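The full FISTA loop can be demonstrated on a toy $\ell_1$-regularized least-squares problem. Everything below (matrix sizes, random seed, `lbd`, iteration count) is an arbitrary illustrative choice, not taken from the document; the structure of the update mirrors the code above.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17, 40]] = [1.5, -2.0, 1.0]
y = A @ x_true

lbd = 0.1
mu = 1.0 / np.linalg.norm(A, 2) ** 2   # step <= 1 / largest eigenvalue of A^T A

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(x):
    return 0.5 * np.sum((y - A @ x) ** 2) + lbd * np.sum(np.abs(x))

x = np.zeros(50)   # x^(0) = 0
z = x.copy()       # extrapolated (momentum) point
tn = 1.0
for _ in range(200):
    grad = -A.T @ (y - A @ z)              # gradient of the quadratic part at z
    x_new = soft(z - mu * grad, lbd * mu)  # proximal (soft-thresholding) step
    tn_new = (1.0 + np.sqrt(1.0 + 4.0 * tn ** 2)) / 2.0
    z = x_new + ((tn - 1.0) / tn_new) * (x_new - x)  # FISTA momentum
    x, tn = x_new, tn_new
```

The momentum step accelerates the $O(1/n)$ convergence of plain iterative soft thresholding to $O(1/n^2)$ in the objective value.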
Note that the step size $\mu$ should be less than or equal to the reciprocal of the largest eigenvalue of the matrix $A^{*}_{ji}A_{jk}$. We estimate $\mu$ by drawing a large number of random unit vectors, applying the operator $A^{*}_{ji}A_{jk}$ to each of them, and keeping the largest resulting norm.
norm=0.
for irun in range(100):
    np.random.seed(irun)
    alphaTmp= np.random.randn(self.nlp,self.nframe,self.ny,self.nx)
    normTmp = np.sqrt(np.sum(alphaTmp**2.))
    alphaTmp= alphaTmp/normTmp
    shearTmp= self.main_forward(alphaTmp)
    alphaTmp2= self.main_transpose(shearTmp)
    normTmp2= np.sqrt(np.sum(alphaTmp2**2.))
    if normTmp2>norm:
        norm=normTmp2
self.mu = 1./norm/1.1
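A small self-contained check of this estimator, with a dense matrix standing in for the `main_forward`/`main_transpose` pair: since $\|A^{T}A v\| \le \lambda_{\max}$ for any unit vector $v$, the random-vector estimate is a lower bound on the largest eigenvalue, which is one motivation for the extra safety factor of 1.1.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((40, 30))   # dense stand-in for the forward operator

norm = 0.0
for irun in range(100):
    v = rng.standard_normal(30)
    v /= np.sqrt(np.sum(v ** 2))    # random unit vector
    w = A.T @ (A @ v)               # apply A^T A (forward then transpose)
    norm = max(norm, np.sqrt(np.sum(w ** 2)))

true_norm = np.linalg.norm(A, 2) ** 2   # exact largest eigenvalue of A^T A
mu = 1.0 / norm / 1.1                   # step size with the same safety margin
```

A power iteration (repeatedly applying $A^{T}A$ to the same vector and renormalizing) would converge to $\lambda_{\max}$ more tightly than independent random draws, at the same cost per application.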
We should use the path-wise coordinate descent algorithm; otherwise the reconstructed structures are significantly blurred in the line-of-sight direction.
The reason is that the lensing kernels for different lens planes are highly correlated, so the basis vectors are highly nonorthogonal.